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Abstract. We provide several algorithms for the exact, uniform random sam¬ 
pling of Latin squares and Sudoku matrices via probabilistic divide-and-conquer 
(PDC). Our approach divides the sample space into smaller pieces, samples each 
separately, and combines them in a manner which yields an exact sample from 
the target distribution. We demonstrate, in particular, a version of PDC in 
which one of the pieces is sampled using a brute force approach, which we dub 
almost deterministic second half, as it is a generalization to a previous appli¬ 
cation of PDC for which one of the pieces is uniquely determined given the 
others. 
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1 Introduction 


The random sampling of combinatorial structures is an active topic, with many available 
techniques. Several general methods have emerged which supply efficient algorithms in many 
cases of interest, e.g., rejection sampling [30], Boltzmann sampling m. Markov chains m 
ED] EH ED- Each method presents difficulties: rejection sampling provides exact samples 
in hnite time, but that finite time is often too large to be useful in practice; Boltzmann 
sampling produces an object which does not always satisfy the restrictions; rapidly mixing 
Markov chains are often easily fashioned to a problem, along with bounds on the mixing time 
(though not always), but they are notably inexact in finite time; and while Markov chain 
coupling from the past is an exact sampling method, it requires constructing a coupling over 
the complete set of objects, which can be practical when there is some type of monotonicity. 

There are many effective approaches to random sampling from the set of 9 x 9 Sudoku 
matrices, which is defined as the collection of 9 x 9 integer-valued matrices such that each 
row and column is a permutation of the set {1,..., 9}, and certain 3x3 sub-blocks (see flT]) ) 
are also permutations of {1,... ,9}. An efficient backtracking algorithm is utilized in [6], 
though no quantitative bounds are given for the bias. Another approach is to fashion a 
Markov chain with uniform stationary distribution; this was described in [15], although it 
was not proved to be rapidly mixing. Importance sampling delivers a collection of Sudoku 
matrices as well as weights which allows one to calculate unbiased estimates of statistical 
parameters; this was done in [25] . 
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We present several algorithms which are an application of probabilistic divide-and-conquer 
(PDC) [2]. The central idea is the division of the sample space into two smaller parts, each 
of which can be sampled separately, and then appropriately pieced back together to form 
an exact sample. The method is versatile enough to be adapted to other types of numerical 
tables, and we have chosen the set of Latin squares as a natural complementary structure; 
a Latin square of order n is an n x n integer-valued matrix such that each row and column 
is a permutation of {1,..., n}. There is also a generalization of Sudoku matrices which can 
be sampled using PDC, presented in Section 16.11 

There are many ways to implement PDC, and in this paper we implement a version called 
almost deterministic second half, which divides the sample space into two pieces, one of which 
can be sampled efficiently using a brute force approach. There are other PDC algorithms: 
self-similar PDC (see [21 Section 3.5]) often provides an asymptotically efficient algorithm, 
but requires detailed information about the sample space; PDC with deterministic second 
half (see [21 Section 3.3], [5]) requires almost no information about the sample space, and 
is such that given one of the pieces, the second piece is uniquely determined; PDC with 
the recursive method (see m) is such that given one of the pieces, the second piece can be 
efficiently sampled using the recursive method [221123]; the advantage of our current approach 
is that it can be customized to suit the available knowledge of the sample space. When we 
do have detailed knowledge about the sample space, e.g., the number of ways of completing 
a partially generated object, as is the case for Sudoku matrices, we show how one can more 
optimally design a PDC algorithm. 

The random sampling of Latin squares and Sudoku matrices is a notoriously difficult prob¬ 
lem, and we can only claim in the present work to have made modest improvements to 
existing algorithms. PDC offers a versatile method for approaching the problem, and al¬ 
lows the fashioner to design an algorithm tailored to the particular computational aspects 
of the problem available. In addition, our algorithms do not require extensive auxiliary 
constructions or complicated transformations. 

The paper is organized as follows. In Section [2] we give the definitions of a Latin Square 
of order n and a Sudoku matrix. In Section [3l we introduce PDC and present some simple 
rejection sampling algorithms. In Section 0] we present a PDC algorithm for the random 
sampling of Sudoku matrices, an analysis of the cost, and some alternative PDC parameter- 
izations. We do the same for Latin squares of order n in Section [5l In Section [6] we review 
other approaches, and in Section [7| we explain our initial motivation and provide a link to a 
code implementation in C-I--I-. 


2 Definitions 


For any n > 1, a Latin square of order n is an n x n matrix which satishes the following row 
and column constraints, heretofore referred to as the Latin square conditions: 

• each row, labelled i?i,..., is a permutation of {1, 2,..., n}; 
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• each column, labelled Ci,..., is a permutation of {1, 2 ,..., n}. 

We let LSn denote the set of all Latin squares of order n. 

By a partially completed matrix, e.g., partially completed Latin square, we mean a matrix 
that has some subset of entries filled in which do not a priori violate the conditions. No 
assumption is implied in general as to whether such a matrix can be completed. We shall 
primarily be interested in partially completed matrices whose first k rows are filled in, al¬ 
though there are several possible applications of PDC which could be exploited in more 
general circumstances, see Section 15.31 A partially completed Latin square of order n with 
the first k rows filled in is called & k x n Latin rectangle, for which much is already known, 
see for example [16] . 

A Sudoku matrix is a Latin square of order 9 which also satisfies the additional block con¬ 
straints: there are nine 3x3 sub-blocks, labelled Bi, B2,..., Bg, each of which is a permu¬ 
tation of {1, 2,... , 9}; these blocks are indicated below. 



B2 

CO 

B, 

B, 

B, 

B-j 

Bs 

Bg 


( 1 ) 


We refer to these three constraints - row, column and block - as the Sudoku conditions, and 
we let S denote the set of all Sudoku matrices. 

In the calculations that follow, a ~ fe is simply the elementary meaning where a and/or b 
have been rounded to some nearest value. For a finite set A, the notation |A| will denote 
its cardinality, i.e., the number of elements in the set A. In addition, we introduce notation 
for asymptotic analysis: suppose f{n) and g{n) are positive functions of a positive integer 
n, then 

• we say f{n) = 0{g{n)) if there exists a C > 0 such that < C for all n larger than 
some uq; 

• we say f{n) = o{g{n)) if —)■ 0 as n tends to inhnity; 

• we say f{n) ~ g{n) if lim^^oo §§ = 1- 
For real-valued x, let 

• [x] denote the integer closest to x] i.e., x rounded to the nearest integer; 

• [xj denote the largest integer smaller or equal to x, also known as the floor of x. 
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3 Probabilistic divide-and-conquer 


We begin by highlighting various simple algorithms for obtaining an element of S, a Sudoku 
matrix, uniformly at random (u.a.r.) over all possibilities. Each of the approaches can be 
adapted to sampling an element of LSn in a straightforward manner, and so we specialize to 
S for concreteness. 

1. Sample each entry independent, and identically distributed (i.i.d.) uniform over the set 
{1,2 ,..., 9}; restart if any of the Sudoku conditions are not satished. 

2. Sample each row (or column or block) as i.i.d. uniform over the set of all permutations 
of (1, 2 ,..., 9}; restart if any of the Sudoku conditions are not satished. 

3. Let the hrst row be (1, 2,..., 9). Sample rows 2 through 8 as i.i.d. uniform over the set 
of all hxed-point free permutations of (1, 2,..., 9}; if row 9 can be completed, with all 
Sudoku conditions satished, then complete it, and hnally apply a random permutation 
of (1,..., 9} to all entries and return the matrix; otherwise, restart. 

These algorithms are generally known as rejection algorithms, since the target set of objects 
S, also referred to as the target region, lies within the complete sample space 11. Since the 
algorithm generates a sample uniformly over all elements of fl, any Sudoku matrix generated 
by such a hard rejection algorithm is also uniform over S (see, e.g., [32] for a more detailed 
analysis in a more general setting). 

The expected number of times one must sample elements of 11 before obtaining a sample 
from S is simply the quotient |11|/|S'|. It was shown in [TT] that the number of Sudoku 
matrices is exactly 


1^1 = 6670903752021072936960 6.67 x 


This number is too large to simply list all of the elements and select one at random. 

There are 9®^ ~ 2.0 x 10^^ matrices in Hi := {1,..., 9}®^®, so for the hrst, most naive 
approach of i.i.d. entry sampling, the expected number of times one must sample from Hi 
is |Hi|/|S'| ~ 3.3 X 10^®. There are IH2I := (g) ~ 3.0 x lO^'^ diherent ways of selecting 9 
distinct permutations of {1,2,..., 9}, whence, |H2|/|S'| 4.5 x 10^^. 


The third algorithm is the most respectable of the three, as it eliminates two of the rows 
and reduces the number of possible permutations from 9! to [9!/e]. However, since we are 
hxing the hrst row, we can only sample from those elements of S which also have the same 
hrst row; thus, we must normalize by |-S'|/9! instead of [S’!. We have H3 := {the set of all 
collections of seven hxed-point free permutations of {1,2,..., 9}}, hence. 


l^sl 

|5|/9! 


(9!/e)^ 

1.8 X 1016 


4.2 X 10^®. 


None of these rejection rates is practical. 
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At this stage, there are many different ways to proceed. One approach is to relax the demand 
that the distribution on elements of S is exactly uniform, and instead adopt algorithms which 
are faster but not guaranteed to be uniform. Another approach is to take advantage of more 
detailed properties of the elements in S, which we now explore using PDC. 

Rejection sampling algorithms can very often be improved, with very little extra information 
known about the target region, using PDC [2]. We assume a decomposition of the sample 
space D of the form Q = A x B, where elements in D can be sampled using a distribution 
on A, and a distribution on B, such that random variables A ^ A and B E B are mutually 
independent; we denote their distributions by C{A) and C{B), respectively. In addition, we 
assume the target distribution on S' C D, denoted by C{S), is of the form 

CiS)=C{{A,B)\h{A,B) = l), 

where h : Ax B ^ {0,1} is some (measurable) functional which determines whether or not 
the pair [A, B) lies in the target set. The PDC Lemma [21 Lemma 2.1] states that, assuming 
P(h(A,R) = 1) > 0, we have £(S) = C{X,Y), where 

C{X) = C{A\h{A, R) = 1), CiY\X = x) = C{B\h{x, B) = l). 

An algorithm to sample from C{S) is then as follows (see [21 Algorithm 2]): 

1. Generate sample from C{A \ h{A, B) = 1), call it x. 

2. Generate sample from C{B \ h{x, B) = 1), call it y. 

3. Return {x,y). 

Often, however, the conditional distributions are not known, and so the more practical PDC 
algorithm utilizes von Neumann’s rejection sampling approach [30], which allows us to sample 
from the conditional distribution C{A\h{A,B) = 1) using C{A) and a biased coin. 


Algorithm 1 [2] Probabilistic Divide-and-Conquer via von Neumann 


1: Fix any a such that maxP(h(a,R) = 0) < a < 1. 

a&A 


2 

3 

4 

5 

6 


■n 1 A ^ n / \ P(/i(a, B) = 1) 

For each a E A, define t(a) := -. 

a 

Generate sample from C{A), call it a. 

Reject a with probability 1 — t(a); otherwise, restart from Line El 
Generate sample from C{B \ h{a,B) = 1), call it y. 

Return {a,y). 


Presently, in order to sample directly from the conditional distribution C{B \ h{a,B) = 1), 
we choose A such that C{B \ h{a, B) = 1) can be sampled effectively using a brute force 
approach; we call this PDC with almost deterministic second half. Also, since the rejection 
probability t{a) must satisfy, for some 0 < a < 1, 


t{a) 


P(/i(a, B) 
a 



for all a E A, 
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the optimal choice of a is maXaP(h(a, 5) = 1 ), but the algorithm is still valid and unbiased 
for any a larger than this value. Thus, there are many ways to apply PDC, starting with an 
appropriate selection of sets A and S, and it suffices to choose any universal upper bound 
of P(/i(a, -B) = 1), where greater efficiency is achieved by selecting the optimal value of a. 


4 Sudoku matrices 

4.1 A PDC algorithm 

We now describe an algorithm for Sudoku matrices which takes advantage of Algorithm [ 1 ] 
Given the first three rows Ri,R 2 ,R 3 of a partially completed Sudoku matrix, let Pg = 
P 9 (Pi, R2, P3) = B 2 , B 3 ) denote the set of all possible permutations of { 1 , 2 ,..., 9} 

which would not violate the Sudoku conditions if placed in row four. Also, we denote by 
U a random variable with the uniform distribution over the interval (0,1), independent of 
all other random variables, and each occurrence of u in an algorithm means to generate a 
random variate from the distribution of U. 


Algorithm 2 PDC Sudoku algorithm. 

1 : Let Pi = (1,2,..., 9). 

2 : Select P 2 , P 3 in proportion to the number of completable Sudoku matrices. 

3: Generate (P4, P5, Rq, P7), each an i.i.d. uniformly random element from Pg. 

4: Let d denote the number of possible completions given (Pi, P 2 , P 3 , P 4 , P 5 , Pe, P 7 ). 
5: if M < A, then 

6 : Select a completion uniformly at random from the d possible completions. 

7: else 
8 : Goto 3. 

9: end if 

10 : Apply a random permutation to (G4, G5, Gg). 

11 : Apply a random permutation to (G7, Gg, Gg). 

12 : Apply a random permutation to ((G4, G5, Gg), (G7, Gg, Gg)). 

13: Apply a random permutation of {1, 2,..., 9} to the entries and return. 


Line 2 demands some further explanation. By considering various symmetries in the per¬ 
mutations of columns, it was shown in [H] that there are only 36288 essentially unique 
completions to blocks P2 and Psj^ and, for each configuration of blocks Pi, P2, P3, there 
is a certain number of completable Sudoku matrices, which was calculated via brute force. 
A downloadable hie containing these enumerations is available online, see [19]; the hrst few 
lines look like the following: 

[456789,789123,123456] => 108374976 

^This number can be simplified to 71 after taking more symmetries into account m- 
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[456789,789123,123465] 

[456789,789123,123546] 

[456789,789123,123564] 

[456789,789123,123645] 


=> 102543168 
=> 102543168 
=> 100231616 
=> 100231616 


The list of three 6-tuples represent columns 3 through 9 of the hrst three rows, and the 
integer value at the end of the arrow counts the number of completable Sudoku matrices 
given these hrst three rows. For example, the text hie says that the matrix with top three 
rows given by 


1 

2 

3 

4 

5 

6 

7 

00 

9 

4 

5 

6 

7 

8 

9 

1 

2 

3 

7 

8 

9 

1 

2 

3 

6 

4 

5 


has precisely 100231616 completions of the bottom 6 rows which are valid Sudoku matrices. 
Line [2] is thus a straightforward sampling of the entries in B 2 and i? 3 , in proportion to 
the number of completable Sudoku matrices. There are two reasonable ways to perform this 
sampling. Denote by Xi the number of possible completions given outcome i, i = 1,..., 36288. 
One can normalize by the sum of all the completions and generate a value using the discrete 
probability distribution {xi/ fhis is not so ideal in general due to boating 

point considerations. Instead, we apply PDC in our implementation by sampling uniformly 
from among the 36288 possibilities hrst, say we select outcome i, and we accept this sample 
with probability Xi/ max^ Xj. 

Line [6] is then the last task of the algorithm, and for this we have implemented a brute force 
approach which enumerates through all possibilities and picks one uniformly at random. 
Specihcally, each column has a set of two elements {a*, 6*}, i = 1,..., 9, which must be used 
in that column in the hnal two rows, in some order, and we are able to greatly reduce the 
enumeration of all a priori 2® = 512 possibilities by taking various symmetries into account; 
see for example the proof of Lemma 14.11 


4.2 Cost of Algorithm [2] 

Lemma 4.1. Assume the first seven rows of a Sudoku matrix have been filled in, and none 
of the Sudoku conditions are violated based on these first seven rows. Then the total number 
of possible completions of this matrix which also do not violate the Sudoku conditions is at 
most 16. This bound cannot be improved. 

Proof. WLOG, assume the last two rows of column j can be completed by {2j — 1, 2j} for 
j = 1, 2, 3. We shall denote this by the following table 


1 

3 

5 







2 

4 

6 








By interchanging 2j — 1 and 2j for j = 1, 2, 3, we obtain 2^ possible combinations. For the 
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elements 7, 8, 9, there are two possibilities: either 7 and 8 share the same column, or they 
do not. 


• In the case where they share the same column, we have WLOG 


1 

3 

5 

7 

9 





2 

4 

6 

8 

a 






which implies another factor of 2 by exchanging the role of 7 and 8; however, whichever 
element is paired with the 9, say a = 5, determines which row the 9 lies in, and so 
there are at most 16 possible completions. 

• In the case where 7 and 8 do not share the same column, whichever element the 7, 8, 
and 9 are paired with uniquely determine their row, and so there are at most 8 possible 
completions in this case. 

Thus, there are at most 2‘^ = 16 possible completions given the hrst seven rows of a partially 
completed Sudoku matrix. 


The following matrix (in reduced form) was generated by Algorithm [2l The hrst seven rows 
of this matrix allow for 16 potential completions, thus, 16 is a tight upper bound. 


1 

2 

3 

4 

5 

9 

6 

7 

8 

4 

5 

6 

7 

8 

2 

3 

1 

9 

7 

8 

9 

1 

6 

3 

2 

5 

4 

9 

4 

8 

2 

1 

7 

5 

6 

3 

5 

7 

2 

3 

4 

6 

8 

9 

1 

6 

3 

1 

5 

9 

8 

4 

2 

7 

3 

6 

7 

9 

2 

4 

1 

8 

5 

00 

9 

5 

6 

3 

1 

7 

4 

2 

2 

1 

4 

8 

7 

5 

9 

3 

6 


□ 


We now justify Line 2 of Algorithm |2l In [TT], the total number of Sudoku matrices is found 
by hrst reducing by symmetry the total number of possible completed hrst three rows; the 
number given is 36288. For each of these 36288 conhgurations of the hrst three rows, a brute 
force enumeration is performed for the number of possible Sudoku matrices that could be 
completed. 

Lemma 4.2 ([H]). The table of 36288 possible configurations of B 2 and B 3 contains, for 
each configuration, the number of possible Sudoku matrices that can be completed given the 
first three rows. These numbers all lie between 94888576 and 108374976. 

The fact that these values are relatively constant has a number of quantitative and qualita¬ 
tive interpretations. From a rejection sampling perspective, it means that we could sample 
a conhguration uniformly at random and then reject each generated conhguration in pro¬ 
portion to the number of completions, normalized by the maximum possible, i.e., 108374976. 
Thus, the probability of rejecting a sample is at worst 1 — 94888576/108374976 ~ 0.12444. 
Practically, this means that sampling of the hrst three rows uniformly from among the 36288 





























possible choices and applying a rejection is efficient. Also, it says that each conhguration 
is approximately as completable as any other conhguration; i.e., accepting any completable 
conhguration of the hrst three rows introduces a small bias. 


Thus, the main cost of the algorithm is the rejection sampling of rows 4 through 7. Through 
a complete enumeration of all 36288 cases, the number of permutations in Pg is always 
between 12000 and 12096. There are thus at most 12096^ 2 x 10^® diherent combinations 

of permutations that can be placed in rows 4 through 7, but only about 10® valid completions 
to a Sudoku matrix are possible. This means that on average we must sample about 2 x 10® 
4-tuples before we get lucky and obtain a quadruplet that yields a completable Sudoku 
matrix. Even after we obtain a completable quadruplet, however, we must still survive the 
hnal rejection in LineO which comes from the fact that different quadruplets yield a different 
number of completable Sudoku matrices. 


Lemma 4.3 ([S!)- In a rejection sampling algorithm, let f denote the target distribution, 
and g denote the sampling distribution; then the rejection probability, see for example Line\^ 
in AlgorithmUl for exact sampling from distribution f is given by 

t{a) = C a E A, 

9W 


where the optimal value for C is given by 


C* := ( 


max 


/(a) 


V “ 9{a) 


-1 


The algorithm is unbiased for any C < C*. Algorithms which utilize rejection sampling have 
a geometrically distributed number of generations before acceptance, with expected value given 
by i. 

Theorem 4.4. Algorithm [H samples uniformly over the set of all Sudoku matrices. The 
number of times to sample blocks B 2 and P 3 by choosing one of the 36288 possible choices 
u.a.r. and using rejection sampling is geometrically distributed with expected value 

36288 ■ 108374976 , ^ 

- 1 . 1 . ( 2 ) 

3546146300288 ^ ^ 

The number of times to sample rows P 4 , P 5 , Rq, Rj is geometrically distributed with expected 
value at least 3.0 x 10® and at most 3.6 x 10®. 


Proof. The uniformity of the algorithm follows by two applications of probabilistic divide- 
and-conquer given by Algorithm [T] We now calculate the precise rejection probabilities. 

The hrst application of PDC is given by the division 


A — (Pi, P2, P3), P — (P4, R5, R&i R7, Rs, Pg)- 

The rejection probabilities t{a) are given by the number of possible completions normalized 
by the maximum number of possible completions; these values are from Lemma 14.21 We 
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index each of the possible conhgurations of B 2 and -83 by i = 1, 2,, 36288. Then we select 
one uniformly at random, hence g{i) = 1/36288, but the distribution desired is the one that 
weights each conhguration in proportion to the number of possible completions, denoted by 
Xi previously, which is contained in the table of values computed in m- That is. 


/(*) 


Xj _ 

3546146300288’ 


i = 1,..., 36288, 


where 3546146300288 = 'Yhj is the total number of possible completions by all elements 
in the table, which makes f{i) a probability distribution. The expected number of times to 
sample from blocks B 2 and B 3 is thus 


1 / fU)\ 36288 • (maxiXi) 

— = max —— = -, 

C V * / 3546146300288 ’ 


where 


maxxi = 108374976 


was computed by brute force over the set of all possibilities. We have just proved ([2]). 


The second application of PDC is the division 


A — ( 84 , i? 5 , Rq, Rj), B — {Rs, Rg). 

The rejection probability t{a) is given by the number of possible completions (calculated via 
brute force) given a = ... ,r 7 ), normalized by 16, the optimal upper bound provided by 

Lemma 14.11 Rather than sample from the set of all permutations of {1,..., 9}, since we 
have already accepted the hrst three rows of the matrix, we can automatically discard any 
permutations which would violate the Sudoku conditions. Thus, we sample these four rows 
from the set Pg, which depends on the particular conhguration of B 2 and B^ accepted in the 
hrst part. By brute force calculation over all 36288 possible hrst three rows, we have found 
that 12000 < \P^\ < 12096. Thus, 

1 / N 1 1 r „ 

- 7 < Q{i) = — 777 < - 7, for all ?. 

120964-^^-^^ |P/|4 - 12000^ 

Similarly, we have 


8 

108374976 


< fU) 


^ completions given hrst seven rows ^ 16 

4^ completions given hrst three rows ~ 94888576 


Whence, 


3.0 X 10® 


12000^ X 16 
108374976 



( max 

V J 



< 


12096^ X 16 
94888576 


^ 3.6 X 10®. 


□ 
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4.3 Alternative PDC parameterizations for Sudoku matrices 

There are two simple alternative approaches to Algorithm [ 2 ] that require minimal modih- 
cation of the original algorithm; we indicate the replacement for lines [3H5] below in each 
case. 

The hrst approach is to randomly sample through 

3: Generate (i? 4 , i? 5 , Rq, R^, Rg), each an i.i.d. uniformly random element of Pg. 

4: Let d denote the number of possible completions given (Pi, R 2 , Rg, R 4 , R 5 , Rq, Rj, Rg)■ 

5: if d = 1, then 

That is, conditional on the existence of a completion to Pg, we may simply accept the 
sample P 4 through Pg and hll in the unique completion to Pg; this variation is called PDC 
deterministic second half [2], see also [Ej. (In general, PDC deterministic second half also 
requires a rejection step, but in the case when £(P | h{a, B) = 1) is a uniform distribution 
over a singleton set for each a G A which can be used to complete a sample, all samples are 
rejected with probability 0.) This approach increases the probability of rejection in Line [5] 
considerably, see Proposition 14.51 below, but eliminates the task in Line [ 6 l 

Proposition 4 . 5 . Suppose Tme 0 in Algorithmic instead samples rows P4 through Pg. Then 
the expected number of rejections until the first eight rows are a partially completed Sudoku 
matrix is within the interval 

12000® 12096® 

108374976’94888576 

Then, once these first eight rows are a partially completed Sudoku matrix, the probability of 
rejection is 0 , and the last row is uniquely determined. 

The proof of Proposition 14.51 is straightforward, with upper bounds on the number of pos¬ 
sible completions provided by Lemma 15.11 and the rejection probability provided by [El 
Theorem 7.1]. 

Alternatively, we can instead randomly sample P 4 through Pg, and reject these three rows 
in proportion to the number of possible completions of the last three rows, indicated below: 

3: Generate {R 4 , R 5 , Rq), each an i.i.d. uniformly random element from Pg. 

4: Let d denote the number of possible completions given (Pi, P2, P3, P4, P5, Pg). 

5: if M < ^^ 4 . 5 g 3 j , then 

From a theoretical point of view, this approach is more ideal, since the probability of rejection 
in LinelElis smaller (since d is much less likely to be 0 ). However, completing the sample as in 
Line| 6 ]can also be a bottleneck, since now there are a priori 3® ~ 2 x 10^ possible completions 
that must be checked, even though we can greatly reduce this number by considering various 
symmetries. 

Proposition 4 . 6 . Suppose LinelEin Algorithmic instead samples rows P4 through Pg. Then 


^ [2.3,2.73] X 
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the expected number of rejections is within the interval 

' [24-5 63J 12000^ [24-5 6^J 12096=^ 

108374976 ’ 94888576 

Then, once these first six rows are determined, the last three rows can be completed in at 
most [2^'®6^J ways. 

The upper bound on the number of possible completions in Proposition 14.61 is borrowed from 
the theory of Latin squares, see Section 15.21 specihcally Lemma 15.11 using the values n = 9 
and k = 6. 

Despite the much smaller expected number of rejections in Proposition 14.61 the reason why 
we champion Algorithm |2] presently is due to our brute force implementation of sampling 
from C{B \ h{a, B) = 1), which is optimized to enumerate between all potential completions 
using fast bit-wise operations in C++, whereas in the more general case we do not have this 
option encoded. We emphasize that while we have not yet efficiently coded this algorithm 
for more general cases, should such a module be coded efficiently it could very easily be 
faster in practice than Algorithm [2l 

In addition, it may be possible to improve upon the upper bound of |_2^'®6^J in this special 
case. To help facilitate such an endeavor, we give a motivating example. The upper bound in 
Lemma [5+] for the number of completable (n—3) xn Latin rectangles is 2"^/^ 6”/^. With n = 9, 
we have [2"'/^ 6”/^J = 4887. However, the largest observed value for partially completed 
Sudoku matrices in a sample of size 1000 was 288. If, in fact, 288 is the smallest upper 
bound, and was used instead of 4887, then the run-time of the algorithm which samples all 
but the hnal three rows would be reduced by a factor of about 17. 


^ [7.8, 9.1] X 10^. 


4.4 Further reduction by symmetries 

When various symmetries are taken into account, one can reduce the total number of Sudoku 
matrices to 5472730538 ~ 5.4 x 10® essentially different Sudoku matrices, see [26]. This 
number is certainly more practical, and a comprehensive list of such matrices could be stored 
for random access if the memory was available, thus offering an algorithm which is 0(1) in 
terms of our costing of random generation; of course, one would also have to implement 
the various transformations, but this cost is certainly not prohibitive. The advantage with 
our approach is that its memory requirements are entirely practical, even for a computer of 
modest means, and can be generalized in a straightforward manner. Nevertheless, should 
Algorithm [2] be deemed not efficient enough, a more efficient random sampling algorithm 
may be achievable by taking these symmetries into account. 
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Latin squares 


5.1 A PDC algorithm 

The upper bound in Lemma l4T] is specialized to the set of Sudoku matrices, and in general 
such tight bounds are not available. Nevertheless, we present a similar PDC algorithm in 
this section for Latin squares of order n for any n > 5. 


Algorithm 3 PDC Uniform sampling from LSn 

1: Let = (1,..., n). 

2: for i = 2,... ,n — 3 do 

3: Generate Ri uniformly from the set of fixed-point free permutations of {1,..., n}. 

4: end for 

5: Let d denote the number of possible completions given (i?i,..., Rn-s)- 

6: if U n n - then 

[6^2^J 

7: Select a completion uniformly at random from the d possible completions. 

8: else 
9: Goto 2. 

10: end if 

11: Apply a random permutation to the rows of the matrix. 

12: Apply a random permutation of {1,..., n} to the entries and return. 


As in Algorithm 121 the selection of a completion in Line [7] once the hrst set of rows has been 
accepted can be performed using brute force enumeration, or any alternative method which 
uniformly samples from the set of possible completions. 


5.2 Cost of Algorithm [3] 

We now discuss the parameterized algorithm for Latin squares of order n. There are two 
competing styles of analysis: one champions an analysis for a particular, finite value of n 
chosen in advance, e.g., n = 9 as in the previous section; the other is an asymptotic analysis 
of how the algorithm scales as n tends to inhnity. Whereas our previous analysis for Sudoku 
matrices was specific to a fixed size 9, we shall now be primarily interested in how Algorithm[3] 
scales as n tends to infinity. 

For a partially completed Latin square of order n with the hrst k rows hlled in, a more 
general bound for all n and k can be obtained using upper bounds on the permanent of an 
n X n matrix taking values in the set {0,1}, as was done in [29]. We now adapt the argument 
in our setting. 

Lemma 5.1. Suppose the first k rows of any Latin square of order n have been filled in, 
k < n, without violating any of the Latin square conditions. 
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1. There always exists at least one completion to a full Latin square of order n, and when 
k = n — 1 there is exactly one completion. 

2. Let Cn,k denote the maximum possible number of completions to a full Latin square of 
order n given these first k rows. Then we have 

n—k 

(3) 

e=i 


Proof. First, the existence of at least one completion of a partially completed Latin square 
is given by [221 Theorem 17.1], and the uniqueness of the completion when k = n — 1 follows 
by taking k = n — 1 in inequality ([3]). 

The remaining argument is the same as [29l Proof of Theorem 17.3]. That is, we let B 
denote the n xn matrix where the (i, j)th entry is 1 if element i does not appear in the first 
k rows of column j, and 0 otherwise. Thus, the row sums of B are all n — k, and so by m 
Theorem 11.5], we have 

perB < (n- 

where per B denotes the permanent of the matrix B, which is also the number of ways 
of completing the next row. Thus, by applying the above inequality n — k — 1 times, for 
completing rows k + 1, k + 2,.. .n — 1, we obtain 

n—1 

e=k 


We shall also need for our analysis of asymptotic runtimes to estimate the magnitude of 
\LSn\, which is still not precisely known asymptotically as n tends to inhnity, although the 
following asymptotic result is contained in [29l Theorem 17.3]. 

Theorem 5.2 ([29]). We have 

\LSn\^^"" ~ e-^n. 


There are, however, well-known upper and lower bounds which shall at least give us the abil¬ 
ity to estimate the efficiency of our algorithms, which we give below. See [291 Theorem 17.2] 
for the lower bound and see [29l the proof of Theorem 17.3] for the upper bound. 


Theorem 5.3 f[29]L We have 




n" 


< \LSr,\ < l[{k\r/^ 

k=l 


Theorem 5.4. Algorithmic samples uniformly over the set of all Latin squares of order n. 
The number of times to sample rows R 2 , ■ ■ ■, Rn-s is geometrically distributed with expected 
value 


[6-/32-/2J {[n\/e]Y-^ 

\LSJJ^\ 


n > 5. 
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Proof. We note that \LSn\ is normalized by n\ since we are taking the hrst row to be the 
permutation (1,...,n). Each of the rows i?*, i = 2,... ,n — 3, is sampled from the set of all 
hxed-point-free permutations, of which there are [n!/e] such permutations, and our sample 
space is the set of all (n—4)-tuples of fixed-point-free permutations. The proof then proceeds 
in the same fashion as the proof of Theorem I4.41 □ 

A table of values is below, calculated from the exact quantities and rounded to a few sig- 

nihcant digits; the values for \LSn\ can be found in the recent survey [28]. In addition, we 

-, / 2 

calculate the asymptotic rate of increase of Ej"' in Proposition 15.51 


n 

\LSr.\ 

En 

5 

1.61 X 10 ^ 

8.15 

6 

8.13 X 10 ® 

1.32 X 102 

7 

6.15 X 10^3 

1.10 X 10 ^ 

8 

1.09 X lO^o 

4.26 X 106 

9 

5.52 X 10^7 

8.41 X 10 ^ 

10 

9.98 X 1036 

8.79 X 10^3 

11 

7.77 X 10 ^^ 

5.03 X 1018 


Proposition 5.5. We have 


eU'^' 


~ e. 


Proof. Recall Stirling’s formula, which states that as n tends to infinity, we have 


n 


n 


n\ ~ —\/27rn, 

gn 


so 




n 


n\ ' ~ 


We then have 

El!^'" = 


[6"/32"/2J {[n\/e]f~^' 

\I^\ 


l/n^ 


6i/(3-)2i/(2-)(n!/e)i/"“3/"' nje 




nje^ 


~ e. 


□ 

These values indicate that the rejection probability is the dominating aspect of the compu¬ 
tation. See Section 1531 for other ways to improve upon this rejection cost. 


5.3 Alternative PDC parameterizations for Latin squares 

Let Un,k '■= denote the right-hand side of inequality Q, i.e., an upper bound for 

the number of completions of a /c x n Latin rectangle. 
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Algorithm [3] can be generalized with a parameter k > 2 so that rows R 2 ,... ,Rk are sampled 
in Line E] for the hrst step in PDC. In this case, the rejection probability in Line 0 may be 
replaced with d/Un,k^ and the number of times to resample has expected value 




We estimate the log of this rejection cost asymptotically as n tends to inhnity. 

Proposition 5.6. For 


• k = o{n), we have 


\og En,k ~ log(n); 


• /c ~ (1 — t) u, for any 0 < t < 1, we have 


\og En,k ~ (2t - l)n^ log(n); 


• k = n — r for any 0 < r < 1, and 0 < a < 1, we have 

\ogEn,k ~ ■'"(1 + log(n) — log(n). 


Proof. To estimate logEn^k, we note that the proof of [291 Theorem 17.3] contains an ap¬ 
proach to estimate both log \ LSn\ and log \ Un,k\] that is, we have 


log ILS'„ I ~u^log(u). 


and in a similar fashion we obtain 


log Un,k ~ niji - k) log(n - k). 


Thus, asymptotically as n and n — k tend to inhnity, we have 

logEn,k = logUn,k + {n-k) log(n!) - (n - /c - 1) - log \LSn\ 
^ n{n — k) log(u — k) + {n — k){n log(?7,)) — log(n). 

The proposition now follows by a straightforward calculation. 


□ 


There are several noteworthy aspects of the asymptotic analysis in Proposition 15.61 First, 
when k = o(n), there is asymptotically as much uncertainty in the hrst k rows as there 
are in the entire ensemble of Latin squares of order n. Next, taking k = n/2, i.e., t = 2, 
is a natural cutoh for when the uncertainty is no longer on the same exponential order 
as the entire ensemble, and indeed it is plausible that a self-similar PDC algorithm using 
this division may achieve an asymptotically optimal rejection rate. Finally, the hnal item 
in Proposition 15.61 is an example of a low-rank parameterization (see for example [1]), and 
yields the greatest immediate potential for generalizing the presented algorithms in a way 
which is still analytically tractable. 
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Of course, even with these divisions, one still has to sample from the remaining set of rows 
Rk+i, ■ ■ ■ ,Rn, which for k moderately large is still a daunting task. 

One can also apply PDC more generally, not just to k x n Latin rectangles, by hlling in any 
amount of partial information. There are many enticing partial results in this area, see m 
Theorem 17.4] for an example which hlls in an upper left rectangle, and [2^ Theorem 17.5] 
for an example which hlls in an upper triangular section of entries. There are also many 
cautionary theoretical results as well, see [29l Figure 17.4] and [3]. 

As a hnal note for this section, it is typical to reduce a Latin square by assuming the hrst 
row and the hrst column are the identity permutation. The number of such reduced Latin 
squares is then |LS'„|/(n!(n — 1)!), and it is often easier to work with this reduced set. This 
reduction could also be exploited for PDC. It is valid to replace Lines 1-3 in Algorithm [3] 
with the following: 

1: Let Ri = (1,..., n). 

2: for i = 2,..., n — 3 

3: Generate Ri uniformly from the set of hxed-point free permutations starting with i. 


This restriction forces the hrst n — 3 entries of column 1 to be the identity permutation 
(1, 2,..., n — 3) (and one could easily force the remaining entries in the column to be {n — 
2,n — l,n) ). However, for 2 < i < n — 3, the set of all hxed-point free permutations of n 
starting with an i is given by the OEIS sequence A000255 [27]. In [121 Page 373], it is shown 
that the cardinality of this set is asymptotically n\/e, which is asymptotically the same as 
the number of hxed-point free permutations of n that we have used in our current analysis. 


6 Other sampling algorithms 

6.1 The R X C generalization of a Sudoku matrix 

There is a generalization to a Sudoku matrix where the block constraint is modihed to be a 
block with dimensions R x C. The set of {R, Gj-Sudoku matrices is then dehned as the set 
otRC xRC matrices tiled with the RxC blocks, where each row, column, and RxC block 
is a permutation of {1, 2 ,..., RC}. Sudoku matrices are the special case with R = C = 3. 

In fact. Algorithm |3] is worded so that it can be applied to (i?, G)-Sudoku matrices varbatim 
using n = RC: one simply interprets Line [5] and Line [7| analogously. Even the denominator 
of the rejection probability, i.e., can be used as an upper bound on the number of 

completable {R, G)-Sudoku matrices. Of course, there is some efficiency lost in the algorithm 
by using this generic upper bound, however, as was pointed out earlier, it does not affect 
the unbiased nature of the algorithm, only the efficiency, and should more efficient bounds 
become available, one could (and should!) substitute in those bounds instead. 
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6.2 ^-permutation matrices 


A recent approach to random sampling of Latin squares and Sudoku matrices is via permu¬ 
tation matrices, see |1] ; see also [I311I1I3I]. A permutation matrix of order iV is an iV x iV 
matrix with a exactly one 1 appearing in each row and column, and the rest of the entries 
are 0. An S-permutation matrix of order iV = is an iV x iV permutation matrix with the 
additional constraint that only one 1 appears in each of the n x n sub-blocks indicated by 
the generalization in Section 16.11 for [n, n)-Sudoku matrices. Central to this idea is the fact 
that each [n, n)-Sudoku matrices A can be written uniquely as 

A = 1 ■ Pi 2 ■ P 2 • Pjsf-, (4) 

where Pi,..., P/v are each permutation matrices of order N, with the additional property 
of being disjoint, i.e., the supports of each matrix are pairwise disjoint. It was shown in [H 
Proposition 1] that the number of S'-permutation matrices is precisely n!^”. In Theorem 1 
and Theorem 2], an exact formula is given using inclusion-exclusion for the number of pairs 
of disjoint S'-permutation matrices. 

Permutation matrices were exploited in [Hj for the random sampling of Latin squares of 
order N, using the same decomposition in (jl]) but using just permutation matrices of order N 
without the additional block constraint. The sampling algorithm in [TH Section 3] generates 
a certain undirected graph whose largest cliques correspond to permutation matrices in (jl]) 
used to construct a Latin square. It is the creation of this undirected graph along with the 
hnding of all cliques which currently dominates the computation. For Latin squares of order 
n, the procedure was shown to be an effective method of uniform sampling for n <7. 


6.3 Importance sampling 

In [H], a backtracking algorithm for Sudoku matrices is presented as follows: 

• Generate the hrst row as a random permutation of {1,..., 9}. 

• Simulate each subsequent row one at a time, and left to right each entry at a time, 
with Ri^j denoting the set of elements in row i and column j which can be placed 
without violating the Sudoku conditions, by selecting an element from Ri j uniformly 
at random at each stage. 

• If for any 2 < i < 9 and 1 < j < 9 we have Rij = 0, then delete the current row and 
the preceding row, and continue. 

This algorithm was used to generate a sample size of 10® in [6]. While no quantitative bounds 
on the bias were proved, the algorithm completed quickly, always producing a valid Sudoku 
matrix. 

In [25] , a similar algorithm is championed, except if the algorithm encounters Rij = 0 for any 
i and j, the algorithm halts. The advantage this algorithm has is that one can more easily 
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keep track of the importance sampling weight, and hence establish an approximation for the 
number of Sudoku matrices as follows. If we assume that each row is a uniform permutation 
of the set {1,..., 9}, then each entry in column j of a Sudoku matrix can be one of 10 — j 
possible values, depending on which of the first j elements were placed first. Whereas in the 
algorithm we select an element uniformly from the set Rij of size \Rij\ < 10 — j, we correct 
for this via the likelihood ratio 








10 


Rij 7 ^ 0 , 


with iij = 0 if Rij = 0. The importance sampling weight is then the product of these ratios, 

i=2 j=l 

This was used to estimate the approximate number of Sudoku matrices, obtaining 6.662-10^^ 
as a point estimate. This algorithm can also be adapted to larger numerical tables, e.g., 
(u, n)-Sudoku matrices, but as the author points out, the probability of generating a feasible 
matrix, i.e., the probability that the algorithm does not halt before completion, becomes 
increasingly small. 


6.4 Markov chain approaches 

In [18] , two Markov chains on the set of Latin squares of order n are presented which involve 
rather intricate sets of transformations to transition from one state to the next, always 
yielding a valid Latin square of order n. It is proved in [THl Theorem 7] that the stationary 
distribution is indeed the uniform distribution, and that transitioning from one state to 
another requires 0{n) arithmetic operations. However, as indicated in [181 Section 6], there 
is no proof that the chain is rapidly mixing. 

A Markov chain for (n, ?7,)-Sudoku matrices was recently introduced in [15], via a connection 
with the set of binary contingency tables (the set of matrices with entries in {0,1} with 
row sums and column sums specihed), and then by finding a Markov basis, see [9]. The 
Markov basis ensures that all states are reachable, however, the technique is admittedly 
not feasible for the standard (3, 3)-Sudoku matrices, and instead it is explored using (2, 2)- 
Sudoku matrices. While an intriguing and unique approach to random sampling, there is 
also no proof that the chain is rapidly mixing. 


7 Final Remarks 


Our motivation for these algorithms is to verify a claim from [6], in which the authors 
calculated and compared the Shannon entropy of a random sample of Sudoku matrices and 
Latin squares of order 9 using a backtracking algorithm, see Section 16.31 This algorithm 
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is fast, though not necessarily unbiased, and allowed the authors to generate a sample of 
size 10® that was used in the estimation of Shannon’s entropy. For Sudoku matrices, the 
entropy was estimated as 1.73312 ± 0.000173, and for Latin squares of order 9 the entropy 
was estimated as 1.73544 ± 0.0001735. Assuming the bias in the backtracking algorithm is 
relatively small, this calculation suggests that a statistical error of at most 10“® is required in 
order to definitively distinguish a difference in entropies between the set of Sudoku matrices 
and the set of Latin squares of order 9, and so we anticipate needing an unbiased sample of 
size at least 10® to effectively distinguish between the entropies of these two sets of matrices. 

While Algorithm [2] and Algorithm [3] are provably unbiased, th^ each took approximately 24 
hours to generate a sample of size 1000 on a personal computeio (using n = 9 in Algorithm|3]). 
Using this i.i.d. sample of size 1000, an unbiased estimate for the entropy of Sudoku matrices 
is 1.73356 ± 0.0383708, and for Latin squares of order 9 our estimate for the entropy is 
1.73335 ± 0.0389771, which is consistent with the backtracking algorithm. 

We have supplied C++ source code used to generate the samples and posted it at 
https: //github. com/stephendesalvo, along with the files containing the random samples 
of size 1000, and scripts to load in the matrices into an n x n x 1000 array in Matlab and a 
1000 X n X n array in Mathematica. 
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